Makassar
Indonesian rescuers find wreckage of plane that had 11 people on board
Indonesian rescuers have recovered wreckage from a missing plane that is believed to have crashed with 11 people on board while approaching a mountainous region on Sulawesi island during cloudy conditions. The discovery on Sunday comes after the small plane - on its way from Yogyakarta on Indonesia's main island of Java to Makassar, the capital city of South Sulawesi province - vanished from radar on Saturday. Rescuers on the ground then retrieved larger debris consistent with the main fuselage and tail scattered on a steep northern slope, Anwar told a news conference. "The discovery of the aircraft's main sections significantly narrows the search zone and offers a crucial clue for tightening the search area," Anwar said. "Our joint search and rescue teams are now focusing on searching for the victims, especially those who might still be alive." The plane, a turboprop ATR 42-500, was operated by Indonesia Air Transport and was last tracked in the Leang-Leang area of Maros, a mountainous district of South Sulawesi province.
Indonesia searches for missing plane with at least 10 on board
Indonesian authorities are searching for a plane carrying three government workers and at least seven crew members after contact with the aircraft was lost, officials said. The fisheries surveillance aircraft had been heading to Makassar, the capital of South Sulawesi, after departing from Yogyakarta Province, before contact was lost, Andi Sultan, operations chief at the Makassar search and rescue agency, told the news agency Reuters. He declined to comment on the possible cause of the incident. Maritime affairs and fisheries minister Sakti Wahyu Trenggono told a news conference on Saturday that three employees from his ministry were on board the plane, which was operated by Indonesia Air Transport. Reports on the number of crew members varied.
Pigs have been island hopping for 50,000 years
With human help, the mammals can defy 'the world's most fundamental natural boundaries.' Despite not exactly being world-renowned swimmers, pigs have spread across the Asia-Pacific region for thousands of years. With the genetic and archeological data from over 700 pigs, a team of scientists documented how people helped the mammals make their way across thousands of miles. "This research reveals what happens when people transport animals enormous distances, across one of the world's most fundamental natural boundaries," evolutionary geneticist and study co-author Dr. David Stanton of the University of Cardiff and Queen Mary University of London said in a statement. "These movements led to pigs with a melting pot of ancestries. These patterns were technically very difficult to disentangle, but have ultimately helped us understand how and why animals came to be distributed across the Pacific islands."
NusaAksara: A Multimodal and Multilingual Benchmark for Preserving Indonesian Indigenous Scripts
Adilazuarda, Muhammad Farid, Wijanarko, Musa Izzanardi, Susanto, Lucky, Nur'aini, Khumaisa, Wijaya, Derry, Aji, Alham Fikri
Indonesia is rich in languages and scripts. However, most NLP progress has been made using romanized text. In this paper, we present NusaAksara, a novel public benchmark for Indonesian languages that includes their original scripts. Our benchmark covers both text and image modalities and encompasses diverse tasks such as image segmentation, OCR, transliteration, translation, and language identification. Our data is constructed by human experts through rigorous steps. NusaAksara covers 8 scripts across 7 languages, including low-resource languages not commonly seen in NLP benchmarks. Although unsupported by Unicode, the Lampung script is included in this dataset. We benchmark our data across several models, from LLMs and VLMs such as GPT-4o, Llama 3.2, and Aya 23 to task-specific systems such as PP-OCR and LangID, and show that most NLP technologies cannot handle Indonesia's local scripts, with many achieving near-zero performance.
NERsocial: Efficient Named Entity Recognition Dataset Construction for Human-Robot Interaction Utilizing RapidNER
Atuhurra, Jesse, Kamigaito, Hidetaka, Ouchi, Hiroki, Shindo, Hiroyuki, Watanabe, Taro
Adapting named entity recognition (NER) methods to new domains poses significant challenges. We introduce RapidNER, a framework designed for the rapid deployment of NER systems through efficient dataset construction. RapidNER operates through three key steps: (1) extracting domain-specific sub-graphs and triples from a general knowledge graph, (2) collecting and leveraging texts from various sources to build the NERsocial dataset, which focuses on entities typical in human-robot interaction, and (3) implementing an annotation scheme using Elasticsearch (ES) to enhance efficiency. NERsocial, validated by human annotators, includes six entity types, 153K tokens, and 99.4K sentences, demonstrating RapidNER's capability to expedite dataset creation.
Artificial Intelligence Based Navigation in Quasi Structured Environment
Kumar, Hariram Sampath, Singh, Archana, Ojha, Manish Kumar
Proper planning of different types of public transportation, such as metro, highway, and waterways, can increase efficiency, reduce congestion, and improve safety in a country. Route planning comes with certain challenges, such as high implementation cost, the need for adequate resources and infrastructure, and resistance to change. The goal of this research is to examine the working, applications, complexity factors, advantages, and disadvantages of Floyd-Warshall, Bellman-Ford, Johnson, Ant Colony Optimization (ACO), Particle Swarm Optimization (PSO), and Grey Wolf Optimizer (GWO), to find the best choice for the above application. In this paper, a comparative analysis of the above-mentioned algorithms is presented. The Floyd-Warshall method and the ACO algorithm are chosen based on the comparisons. Also, a combination of a modified Floyd-Warshall with the ACO algorithm is proposed. The proposed algorithm showed better results with less time complexity when applied to randomly structured points within a boundary, called quasi-structured points. In addition, this paper discusses future work on integrating Floyd-Warshall with ACO to develop a real-time model for overcoming the above-mentioned challenges during transportation route planning.
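For readers unfamiliar with the Floyd-Warshall method named in this abstract, the following is a minimal sketch of the textbook all-pairs shortest-path algorithm. It is not the modified version or the ACO hybrid the authors propose; the function name `floyd_warshall` and the toy edge list are illustrative assumptions.

```python
import math


def floyd_warshall(n, edges):
    """Return an n x n matrix of shortest-path distances.

    Textbook Floyd-Warshall, O(n^3) time. `edges` is a list of
    (source, target, weight) tuples for a directed graph.
    """
    dist = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0.0
    for u, v, w in edges:
        # Keep the cheapest edge if the same pair is listed more than once.
        dist[u][v] = min(dist[u][v], w)

    # Allow each node k in turn to serve as an intermediate stop.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = dist[i][k] + dist[k][j]
                if via_k < dist[i][j]:
                    dist[i][j] = via_k
    return dist


# Hypothetical toy network with four stops (0..3).
edges = [(0, 1, 4.0), (0, 2, 1.0), (2, 1, 2.0), (1, 3, 5.0)]
dist = floyd_warshall(4, edges)
```

In this toy graph the direct edge from stop 0 to stop 1 costs 4.0, but the route through stop 2 costs 3.0, so `dist[0][1]` ends up as the cheaper value. Unreachable pairs keep the value `math.inf`.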
SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages
Lovenia, Holy, Mahendra, Rahmad, Akbar, Salsabil Maulana, Miranda, Lester James V., Santoso, Jennifer, Aco, Elyanah, Fadhilah, Akhdan, Mansurov, Jonibek, Imperial, Joseph Marvin, Kampman, Onno P., Moniz, Joel Ruben Antony, Habibi, Muhammad Ravi Shulthan, Hudi, Frederikus, Montalan, Railey, Ignatius, Ryan, Lopo, Joanito Agili, Nixon, William, Karlsson, Börje F., Jaya, James, Diandaru, Ryandito, Gao, Yuze, Amadeus, Patrick, Wang, Bin, Cruz, Jan Christian Blaise, Whitehouse, Chenxi, Parmonangan, Ivan Halim, Khelli, Maria, Zhang, Wenyu, Susanto, Lucky, Ryanda, Reynard Adha, Hermawan, Sonny Lazuardi, Velasco, Dan John, Kautsar, Muhammad Dehan Al, Hendria, Willy Fitra, Moslem, Yasmin, Flynn, Noah, Adilazuarda, Muhammad Farid, Li, Haochen, Lee, Johanes, Damanhuri, R., Sun, Shuo, Qorib, Muhammad Reza, Djanibekov, Amirbek, Leong, Wei Qi, Do, Quyet V., Muennighoff, Niklas, Pansuwan, Tanrada, Putra, Ilham Firdausi, Xu, Yan, Tai, Ngee Chia, Purwarianti, Ayu, Ruder, Sebastian, Tjhi, William, Limkonchotiwat, Peerat, Aji, Alham Fikri, Keh, Sedrick, Winata, Genta Indra, Zhang, Ruochen, Koto, Fajri, Yong, Zheng-Xin, Cahyawijaya, Samuel
Southeast Asia (SEA) is a region rich in linguistic diversity and cultural variety, with over 1,300 indigenous languages and a population of 671 million people. However, prevailing AI models suffer from a significant lack of representation of texts, images, and audio datasets from SEA, compromising the quality of AI models for SEA languages. Evaluating models for SEA languages is challenging due to the scarcity of high-quality datasets, compounded by the dominance of English training data, raising concerns about potential cultural misrepresentation. To address these challenges, we introduce SEACrowd, a collaborative initiative that consolidates a comprehensive resource hub that fills the resource gap by providing standardized corpora in nearly 1,000 SEA languages across three modalities. Through our SEACrowd benchmarks, we assess the quality of AI models on 36 indigenous languages across 13 tasks, offering valuable insights into the current AI landscape in SEA. Furthermore, we propose strategies to facilitate greater AI advancements, maximizing potential utility and resource equity for the future of AI in SEA.
IndoToxic2024: A Demographically-Enriched Dataset of Hate Speech and Toxicity Types for Indonesian Language
Susanto, Lucky, Wijanarko, Musa Izzanardi, Pratama, Prasetia Anugrah, Hong, Traci, Idris, Ika, Aji, Alham Fikri, Wijaya, Derry
Hate speech poses a significant threat to social harmony. Over the past two years, Indonesia has seen a ten-fold increase in the online hate speech ratio, underscoring the urgent need for effective detection mechanisms. However, progress is hindered by the limited availability of labeled data for Indonesian texts. The condition is even worse for marginalized minorities, such as Shia, LGBTQ, and other ethnic minorities because hate speech is underreported and less understood by detection tools. Furthermore, the lack of accommodation for subjectivity in current datasets compounds this issue. To address this, we introduce IndoToxic2024, a comprehensive Indonesian hate speech and toxicity classification dataset. Comprising 43,692 entries annotated by 19 diverse individuals, the dataset focuses on texts targeting vulnerable groups in Indonesia, specifically during the hottest political event in the country: the presidential election. We establish baselines for seven binary classification tasks, achieving a macro-F1 score of 0.78 with a BERT model (IndoBERTweet) fine-tuned for hate speech classification. Furthermore, we demonstrate how incorporating demographic information can enhance the zero-shot performance of the large language model, gpt-3.5-turbo. However, we also caution that an overemphasis on demographic information can negatively impact the fine-tuned model performance due to data fragmentation.
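The macro-F1 score reported in this abstract averages per-class F1 scores with equal weight, so a rare class counts as much as a common one. The sketch below shows that computation on made-up labels; the function name `macro_f1` and the example data are assumptions for illustration and are unrelated to the IndoToxic2024 results.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical binary labels: 1 = toxic, 0 = not toxic.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 1]
score = macro_f1(y_true, y_pred)
```

Here the classifier gets 5 of 8 labels right, yet the toxic class has an F1 of only 0.4, which pulls the macro average down to roughly 0.56. That sensitivity to minority classes is why macro-F1 is a common choice for imbalanced classification tasks.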
Generative AI: The power of the new education
Altares-López, Sergio, Bengochea-Guevara, José M., Ranz, Carlos, Montes, Héctor, Ribeiro, Angela
The effective integration of generative artificial intelligence in education is a fundamental aspect to prepare future generations. This study proposes an accelerated learning methodology in artificial intelligence, focused on its generative capacity, as a way to achieve this goal. It recognizes the challenge of getting teachers to engage with new technologies and adapt their methods in all subjects, not just those related to AI. This methodology not only promotes interest in science, technology, engineering and mathematics, but also facilitates student understanding of the ethical uses and risks associated with AI. Students' perceptions of generative AI are examined, addressing their emotions towards its evolution, evaluation of its ethical implications, and everyday use of AI tools. In addition, AI applications commonly used by students and their integration into other disciplines are investigated. The study aims to provide educators with a deeper understanding of students' perceptions of AI and its relevance in society and in their future career paths.
Constructing and Expanding Low-Resource and Underrepresented Parallel Datasets for Indonesian Local Languages
Lopo, Joanito Agili, Tanone, Radius
In Indonesia, local languages play an integral role in the culture. However, the available Indonesian language resources still fall into the category of limited data in the Natural Language Processing (NLP) field. This becomes problematic when building NLP models for these languages. To address this gap, we introduce Bhinneka Korpus, a multilingual parallel corpus featuring five Indonesian local languages. Our goal is to enhance access to and utilization of these resources, extending their reach within the country. We explain in detail the dataset collection process and associated challenges. Additionally, we experimented with a translation task using IBM Model 1 due to data constraints. The results showed that the performance for each language already gives good indications for further development. Challenges such as lexical variation, smoothing effects, and cross-linguistic variability are discussed. We intend to evaluate the corpus using advanced NLP techniques for low-resource languages, paving the way for multilingual translation models.
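IBM Model 1, the translation model mentioned in this abstract, learns word translation probabilities from sentence pairs with expectation-maximization. The following is a minimal sketch of the standard algorithm, simplified by leaving out the NULL source token; the function name `train_ibm_model1` and the English-German toy corpus are illustrative assumptions and do not come from Bhinneka Korpus.

```python
from collections import defaultdict


def train_ibm_model1(pairs, iterations=10):
    """Estimate t(f | e) by EM, keyed as t[(f, e)].

    `pairs` is a list of (source_tokens, target_tokens). This simplified
    version omits the NULL source word used in the full model.
    """
    target_vocab = {f for _, tgt in pairs for f in tgt}
    uniform = 1.0 / len(target_vocab)
    t = defaultdict(lambda: uniform)  # uniform initialisation

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of f aligned to e
        total = defaultdict(float)  # expected count of e aligned to anything
        # E-step: spread each target word over the source words of its sentence.
        for src, tgt in pairs:
            for f in tgt:
                norm = sum(t[(f, e)] for e in src)
                for e in src:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: renormalise expected counts into probabilities.
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return t


# Hypothetical three-sentence parallel corpus.
pairs = [
    (["the", "house"], ["das", "haus"]),
    (["the", "book"], ["das", "buch"]),
    (["a", "book"], ["ein", "buch"]),
]
t = train_ibm_model1(pairs)
```

Because "the" co-occurs with "das" in two sentences, EM shifts probability toward that pairing, which in turn frees "haus" to align with "house". The model needs only co-occurrence statistics, which is one reason it remains usable when parallel data is scarce.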